Switch from INSERT INTO to COPY FROM for postgres#43
Merged
brycekbargar merged 4 commits intolibrary-data-platform:release-v3.2.0from Sep 17, 2025
Merged
Conversation
63e9c88
into
library-data-platform:release-v3.2.0
5 checks passed
brycekbargar
added a commit
that referenced
this pull request
Sep 18, 2025
I wasn't super happy with the weird type shenanigans happening in the last MR #43 but thought that doing a loads/dumps on all the source records to keep it consistent would be too slow. I discovered orjson.Fragment which means we can just use dumps and treat both srs and non-srs as bytes. This simplified the signatures and type handling quite a bit. Streaming support for SRS was rushed because non-streaming became even more unstable under Ramsons. There was a chance (though probably small) that someone was loading source-storage endpoints that weren't the source-records one and they would have broken. This adds more consistency around which endpoints we do support with streaming and fixes any accidental breaks.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Using COPY FROM is an order of magnitude faster for bulk insertions for postgres. This is the last low-hanging fruit for download performance until async/await is supported and optimized for in a future future release. DuckDB can kind of sort of do bulk operations but it isn't a priority right now, especially with sqlite still supported in this release.
Getting bytes do go directly into the database was a bit of a struggle but postgres expects a "version" byte at the beginning of any record. By doing the byte munging ourselves we skip a whole lot of conversion in ldlite and conversion/logic in psycopg which would have added points of failure and slowness. I'm going to be testing to make sure it works across 5C FOLIO data before releasing the change.